ABSTRACT

Traditional recommendation systems make recommendations
based solely on the customer's past purchases, product ratings
and demographic data, without considering the profitability
of the items being recommended. In this work we
study the question of how a vendor can directly incorporate
the profitability of items into its recommender so as to
maximize its expected profit while still providing accurate
recommendations. Our approach takes the output of any
traditional recommender system and adjusts it according
to item profitabilities. Our approach is parameterized
so the vendor can control how much the recommendation
incorporating profits can deviate from the traditional
recommendation. We study our approach under two settings
and show that it achieves approximately 22% more profit
than traditional recommendations.

1. INTRODUCTION 

Recommendation systems are important tools for major
companies like Amazon, Netflix and Pandora. They use a
customer's demographic data, past purchases and past product
ratings to predict how the customer will rate new products
[?, 12]. Recommender systems have been shown to
help customers become aware of new products of interest,
increase sales and encourage customers to return to the
business for future purchases [?]. Designing recommendation
systems that can accurately predict customer ratings
has generated much research and interest in both the
academic and business communities. The Netflix Prize was a
manifestation of this interest [11]. However, the majority of
the work on recommender systems has not explicitly
considered how the profitability of products could be incorporated
into the recommendations. An article published in
Knowledge@Wharton claims that the actual Netflix recommendation
system modifies its ratings to encourage consumers
to order the more obscure movies, which are presumably
cheaper for Netflix to supply than major blockbusters [17].
While Netflix does not publicly reveal that it uses such
methods, it seems natural for a business to incorporate the
profitability of products into its recommendations.
Recommendations are, in some sense, extremely targeted advertising
disguised as helpful suggestions, and the explicit goal of
advertising is to increase profit for the company.

In this paper, we study the question of how a vendor 
should incorporate the profitability of items into its rec- 
ommendations so as to maximize its expected profits. A 
naive approach is to give the most profitable items the high- 
est recommendations. Then these items would presumably 
be bought more often and the business would make more 
money. However this tactic has some obvious flaws. While 
the customer may initially follow the vendor's recommenda- 
tion, she may find that she does not like the items as much 
as the vendor predicted. After only a few such experiences, 
she would realize that the vendor's recommendations do not 
accurately reflect her tastes. In the best case for the vendor, 
the customer would ignore the vendor's recommendations 
and continue her natural purchasing behavior while in the 
worst case she would lose trust not only in the vendor's rec- 
ommendations but also in the vendor as a whole and take 
her business elsewhere. Thus incorporating the profitability
of items into recommendations must be done carefully so
that the customer's trust is not compromised.

A reasonable assumption is that as long as the vendor con- 
sistently presents recommendations that are similar enough 
to the customer's own ratings, the customer will maintain a 
high level of trust in the accuracy of the vendor's recommen- 
dations. For this reason, we use an established similarity 
measure as a measure of trust. We assume that the vendor
has access to a vector c giving the consumer's true ratings
for items. The vendor's objective is to present a recommendation
vector r to the customer which is within a certain
threshold of similarity to c and maximizes the vendor's
expected profit (by incorporating the profitability of items into
the recommendation). Section 2 gives details of the model.

One question that might arise is how the vendor can
determine the customer's true ratings (i.e., c) if the ratings are
for products that the consumer has not yet rated herself.
We believe there is a large amount of research and activity 
in developing highly accurate recommendation systems that 
solve this problem and assume the predictions from these 
systems for well established customers are good approxima- 
tions to the customer's true ratings. 

The idea that the customer will maintain high trust as 
long as the vendor's recommendations are within a threshold 
of similarity of her own ratings has empirical support. Hill 
et al. showed that users asked to rate the same item at
different times supply different ratings [?]. If there is some
natural variability in the ratings that users supply to the 
vendor, then slight differences between the predicted ratings 
and the actual customer ratings should be insignificant to 
the customer. 

Chen et al. also considered using the profitability of items 
explicitly in recommender systems [4], but they do not ex- 
plicitly require a level of accuracy in their system. 

The outline of our paper is as follows: Section 2 defines our
model and problem, Section 3 describes the similarity measure
and justifies its use as a measure of customer trust, and
Section 4 describes our approach for maximizing expected
profit in different scenarios.

2. MODEL 

Let n be the number of items being sold by the vendor. Let
c and r be vectors of length n whose ith components,
denoted $c_i$ and $r_i$, each give a rating for item i. All items are
rated using numbers between zero and some maximum rating m.

We focus on the scenario where the vendor is interacting 
with an established customer who the vendor would like to 
continue to do business with in the long term. We assume 
the vendor uses a recommendation system which makes good
predictions about how such a customer rates items and that
vector c gives these predictions.^1 The vendor presents the
customer with recommendation vector r to help her decide 
which item to purchase. The customer has a certain level of 
trust in the accuracy of the vendor's recommendation which 
she has developed from experience with the vendor's past 
recommendations. When choosing an item to purchase, the 
customer considers r based on how much trust she has in 
the vendor's recommendations. 

The exact influence that a customer's level of trust in the 
vendor's recommendations has on her purchasing behavior 
is too complex to model precisely. For this reason, we make
the following simplifying assumptions to allow for a model
of that behavior. First, we assume that the consumer's trust
is closely tied to how similar the vendor's recommendations
are to the ratings she would give items. Second, we assume
that as long as a similarity threshold is consistently met
by the vendor, the consumer will use the recommendations
when deciding what items to purchase. Specifically, we
assume there is a function $T(r)$ that assigns a scalar value to
the notion of similarity between r and c, where higher values
indicate greater similarity. We then assume that if, for
every r that the vendor presents to the customer, $T(r)$ meets or
exceeds some threshold value $\tau$, then the customer's trust
in the recommendation system will remain at a constant,
significantly high level. Finally, we assume that if the
customer's trust remains at this level, her purchasing decision
will be solely a function of r. The customer's level of trust
and her subsequent purchasing behavior are unknown if the
vendor presents r such that $T(r) < \tau$.

The intuition behind these assumptions is that, as long 
as the vendor recommendation consistently predicts ratings 
that are similar enough to how the customer would rate the 
items, the customer will maintain a certain level of trust in 
the accuracy of the recommendation system. As a result, 
she will use the vendor's recommendation as information 
when deciding which items to purchase. However, if r is too
far from c, the customer loses trust in the vendor's
recommendations and no longer considers them when deciding which
item to buy. In that case, her purchasing behavior is unclear.

^1 The customer does not necessarily know all the entries of
c herself as she has not purchased all items.

The vendor's main goal is to maximize profit. However,
to maintain customer trust he is required to present a vector
r such that $T(r) \ge \tau$ for some constant $\tau$. Let $\varphi(r)$
be a vector-valued function whose ith component gives the
probability that the customer will purchase item i at
a given time step. The customer, at any step, can purchase
zero or more items, so the components of $\varphi(r)$ need not sum
to one. Let p be the profit vector whose ith component gives
the profit received when item i is purchased. The vendor's
expected profit is given by $E_p = p \cdot \varphi(r)$. Formally, our
problem is to maximize the vendor's expected profit while
maintaining a level $\tau$ of trust with the customer:

\[ \max_{r} \; p \cdot \varphi(r) \quad \text{s.t.} \quad T(r) \ge \tau \tag{1} \]

3. SIMILARITY MEASURES FOR TRUST 

In this section we argue that the Dice coefficient, which
measures similarity between two vectors, is an appropriate
measure of consumer trust, and briefly discuss why some of
the other common similarity measures are lacking.

Dice coefficient. We adopt the Dice coefficient given in
Equation 2 to measure trust $T(r)$. Note that the Dice
coefficient is a popular measure which has previously been used
to measure recommendation accuracy [?].

\[ Dice(r) = \frac{2\, c \cdot r}{\|c\|^2 + \|r\|^2} \tag{2} \]

Above, $\|x\| = \sqrt{x \cdot x}$ denotes the length of vector x.
Normally the Dice coefficient is denoted as a function of the
two vectors whose similarity is being measured, but here we
denote it as a function of r only, to emphasize the
fact that c is a constant known to the vendor. Let $\theta$ denote
the angle between c and r. Another way of stating the Dice
coefficient of Equation 2 is

\[ Dice(r) = \cos(\theta) \cdot \frac{2\,\|c\|\,\|r\|}{\|c\|^2 + \|r\|^2} \tag{3} \]

We now list some properties of the Dice coefficient that
make it a reasonable function to measure trust. See the
appendix for proofs.

Property 3.1. $Dice(r)$ is always between zero and one.
$Dice(r) = 1$ if and only if $c_i = r_i$ for every item i. $Dice(r) = 0$
if and only if $r_i = 0$ on all items i such that $c_i > 0$.

Thus $Dice(r)$ is one only when r is in complete agreement
with c, and it is zero only when r disagrees with c on
all relevant items.
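As a small illustration of Property 3.1, the following sketch evaluates the Dice coefficient, $2\,c \cdot r / (\|c\|^2 + \|r\|^2)$, on a few hand-picked vectors. The function name `dice` and the example ratings are hypothetical and chosen only for illustration.

```python
def dice(c, r):
    """Dice similarity 2 c.r / (|c|^2 + |r|^2) between rating vectors c and r."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    norm_sq = sum(ci * ci for ci in c) + sum(ri * ri for ri in r)
    return 2.0 * dot / norm_sq  # undefined when both vectors are all zero

c = [5, 3, 0, 1]                  # hypothetical customer ratings
same = dice(c, c)                 # r agrees with c on every item
disjoint = dice(c, [0, 0, 4, 0])  # r is zero wherever c is positive
close = dice(c, [4, 3, 1, 1])     # r is near c but not identical
print(same, disjoint, close)
```

The first value is one, the second is zero, and the third lies strictly between them, matching the three cases of the property.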

Jaccard measure. The Jaccard similarity measure, given
below, behaves similarly to the Dice coefficient and is used
widely in information retrieval and data mining [16, 2].

\[ Jac(r) = \frac{c \cdot r}{\|c\|^2 + \|r\|^2 - c \cdot r} \]

It is another appropriate measure of consumer trust and all
results extend to the setting where $T(r) = Jac(r)$.

Cosine measure. The cosine similarity measure, given in
Equation 4, is equal to the cosine of $\theta$.^2 It is always between
zero and one as no item is rated less than zero.

\[ Cos(r) = \cos(\theta) = \frac{c \cdot r}{\|c\|\,\|r\|} \tag{4} \]

^2 Recall that $\theta$ is the angle between c and r.



The cosine measure is not influenced by differences in the
lengths of r and c, and this is the main reason it seems
unsuitable for measuring trust, as demonstrated in the following
example.

Example. Suppose the items are rated between 1 and 5.
Consider a picky customer who rates all items low and a
vendor that recommends all items high, i.e., $c_i = 1$ and $r_i = 5$
for all i. The cosine measure will be equal to one because $\theta = 0$,
indicating that the picky customer has high trust in the
vendor's recommendation, which is surely not the case.
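The picky-customer example can be checked numerically. The sketch below compares the cosine measure with the Dice coefficient on that example; the helper names and the choice of n = 10 items are assumptions made only for illustration.

```python
import math

def cosine(c, r):
    """Cosine of the angle between c and r."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return dot / (math.sqrt(sum(x * x for x in c)) * math.sqrt(sum(x * x for x in r)))

def dice(c, r):
    """Dice similarity 2 c.r / (|c|^2 + |r|^2)."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return 2.0 * dot / (sum(x * x for x in c) + sum(x * x for x in r))

n = 10          # hypothetical number of items
c = [1] * n     # picky customer: every item rated 1
r = [5] * n     # vendor: every item recommended at 5

cos_val = cosine(c, r)   # parallel vectors, so the angle is zero
dice_val = dice(c, r)    # 2*5n / (n + 25n) = 10/26, independent of n
print(cos_val, dice_val)
```

Because r is a scalar multiple of c, the cosine measure reports perfect similarity, while the Dice coefficient stays at 10/26 regardless of n and so registers the difference in length.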

Mean squared error and distance measures. The
mean squared error (MSE) measures the average dissimilarity
between r and c. To measure similarity we could use
1-MSE, with MSE as in Equation 5, which is always between
zero and one. Note that m denotes the maximum possible rating.

\[ MSE(r) = \frac{1}{n} \sum_i \left( \frac{c_i}{m} - \frac{r_i}{m} \right)^2 \tag{5} \]



The 1-MSE measure gives equal credit for agreements on
items the customer dislikes as for agreements on items she
prefers, and this makes it unsuitable for measuring
trust, as demonstrated by the example below. This example
also applies to other distance-based similarity measures such
as Euclidean distance and Manhattan distance.

Example. Suppose the maximum rating is 5. Let
c = [5, 5, 5, 1, 1, 1, ..., 1], which represents a customer who
rates a few items very high but dislikes most items.
Consider the following recommendation r = [1, 1, 1, ..., 1, 5, 5, 5]
where the vendor gives the highest ratings to a few items
that the customer dislikes and gives low ratings to all
other items, including the items the customer prefers. MSE
is $\Theta(1/n)$ so that 1-MSE approaches 1 as the number of
items n gets large.
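The growth of 1-MSE in this example can be reproduced with a few lines of code. The sketch below assumes that the MSE averages the squared differences of the normalized ratings over the n items and that the maximum rating is m = 5; the function names are hypothetical.

```python
def one_minus_mse(c, r, m):
    """1 - MSE, where MSE averages (c_i/m - r_i/m)^2 over all items."""
    n = len(c)
    return 1.0 - sum((ci / m - ri / m) ** 2 for ci, ri in zip(c, r)) / n

def example(n):
    """Customer likes the first three items; vendor pushes the last three (n >= 6)."""
    c = [5, 5, 5] + [1] * (n - 3)
    r = [1] * (n - 3) + [5, 5, 5]
    return one_minus_mse(c, r, m=5)

scores = {n: example(n) for n in (10, 100, 1000)}
for n, s in scores.items():
    print(n, s)
```

Only six items ever disagree, each contributing (4/5)^2 to the sum, so the reported similarity climbs toward one as n grows even though the vendor misrates every item the customer cares about.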

4. PROFIT MAXIMIZATION 

Using the Dice similarity coefficient defined in Section 3
as the trust function, the vendor's optimization problem is

\[ \max_{r} \; E_p(r) = p \cdot \varphi(r) \quad \text{s.t.} \quad Dice(r) = \frac{2\, c \cdot r}{\|c\|^2 + \|r\|^2} \ge \tau \tag{6} \]

We outline a general approach for solving Equation 6 and
then apply our technique to two different objective functions
obtained by alternative definitions of $\varphi(r)$. We analyze how
much profit the vendor gains by presenting the customer
with recommendation r rather than c.

4.1 General Approach 

Concepts from vector calculus give us a general approach
for solving the vendor's maximization problem stated in
Equation 1. Rearranging the Dice constraint above and
completing the square reveals that it is equivalent to

\[ \sum_i \left( r_i - \frac{c_i}{\tau} \right)^2 \le \left( \frac{1}{\tau^2} - 1 \right) \sum_i c_i^2 \tag{7} \]

As c is a constant for our setting, the feasible r for our
problem lie in a region enclosed by an n-sphere with radius
$\sqrt{(1/\tau^2 - 1) \sum_i c_i^2}$, which we refer to as the Dice sphere.
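The equivalence between the Dice constraint and the sphere constraint can be sanity-checked numerically. The sketch below draws random rating vectors and confirms that $Dice(r) \ge \tau$ holds exactly when $\sum_i (r_i - c_i/\tau)^2 \le (1/\tau^2 - 1) \sum_i c_i^2$, i.e., when r lies in the sphere centered at $c/\tau$. The function names, the seed, the rating range and the number of trials are assumptions made for illustration.

```python
import random

def dice(c, r):
    """Dice similarity 2 c.r / (|c|^2 + |r|^2)."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return 2.0 * dot / (sum(x * x for x in c) + sum(x * x for x in r))

def in_dice_sphere(c, r, tau):
    """Sphere form of the constraint: |r - c/tau|^2 <= (1/tau^2 - 1) |c|^2."""
    lhs = sum((ri - ci / tau) ** 2 for ci, ri in zip(c, r))
    rhs = (1.0 / tau ** 2 - 1.0) * sum(ci * ci for ci in c)
    return lhs <= rhs

random.seed(0)
tau = 0.9
c = [random.uniform(0.0, 5.0) for _ in range(6)]  # hypothetical true ratings

trials = 10000
agree = 0
for _ in range(trials):
    r = [random.uniform(0.0, 5.0) for _ in range(6)]
    if (dice(c, r) >= tau) == in_dice_sphere(c, r, tau):
        agree += 1
print(agree, "of", trials, "random vectors agree")
```

A random check of this kind is not a proof, but any disagreement would indicate an algebra slip in the rearrangement.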



The general approach to solving the maximization problem of
Equation 6 involves two parts. The first is to determine if
there are any local maxima that lie strictly inside the Dice
sphere, i.e., which satisfy $Dice(r) > \tau$. The second part is
to find the vector r that maximizes expected profit over all
vectors on the surface of the Dice sphere, i.e., which satisfy
$Dice(r) = \tau$. The largest of the local maxima in the sphere
and the maximum on the surface is the global maximum.

The gradient of the objective function is zero at each
local maximum inside the Dice sphere. Thus all solutions to
$\nabla E_p = 0$ are candidate vectors. Let $r^{(1)}, r^{(2)}, \ldots, r^{(k)}$ be the list
of all vectors that satisfy this property. While these vectors
could be local minima or saddle points instead of local
maxima, it is not necessary to distinguish between them. It is
only necessary to find the greatest value of $E_p(r^{(i)})$ over all i
and compare it to the maximum over all vectors on the surface
of the sphere. Let $r^*_{in}$ denote the maximum vector inside the
Dice sphere.

The maximum vector on the surface of the Dice sphere
can be found using the method of Lagrange multipliers.
The maximum-valued vector $r^*_s$ will satisfy
$\nabla E_p(r^*_s) = \lambda \nabla Dice(r^*_s)$ and $Dice(r^*_s) = \tau$. The better of $r^*_s$ and $r^*_{in}$
is the solution to the optimization problem [3, 15].



4.2 Simple probability function 

Consider a scenario where the customer can purchase zero
or more items at each time step, and where the probability that
the customer purchases item i is independent of the vendor's
ratings for other items. Here is a simple way to define the
probability function which satisfies this assumption:

\[ \varphi(r) = \left( \frac{r_1}{m}, \frac{r_2}{m}, \ldots, \frac{r_n}{m} \right) \tag{8} \]

Recall that m is the highest possible rating for an item.
With this definition of $\varphi$, the probability that a customer
purchases an item is linearly proportional to the vendor's
rating of that item, and the vendor's expected profit is

\[ E_p(r) = p \cdot \varphi(r) = \frac{1}{m} \sum_i p_i r_i \tag{9} \]



If prices are all greater than zero, $\nabla E_p = (p_1/m, \ldots, p_n/m) \ne 0$,
so there are no local maxima inside the sphere. Thus
we proceed to finding the maximum on the surface of
the Dice sphere. Using Equation 7 we have
$\nabla Dice(r) = (2(r_1 - c_1/\tau), \ldots, 2(r_n - c_n/\tau))$, so the maxima on the Dice sphere
surface must satisfy the following system of equations,

\[ \frac{p_i}{m} = 2\lambda \left( r_i - \frac{c_i}{\tau} \right) \text{ for all } i \quad \text{and} \quad \sum_i \left( r_i - \frac{c_i}{\tau} \right)^2 = \left( \frac{1}{\tau^2} - 1 \right) \sum_i c_i^2 \]

where the last equation requires r to lie on the surface of the
Dice sphere. Solving the first set of equations we obtain

\[ r_i = \frac{p_i}{2 m \lambda} + \frac{c_i}{\tau} \tag{10} \]



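Substituting the value of $r_i$ into Equation 7 (taken with equality) we get
$\lambda = \frac{1}{2m} \sqrt{\frac{\sum_j p_j^2}{(1/\tau^2 - 1) \sum_j c_j^2}}$.
The final solution is obtained by plugging $\lambda$ into Equation 10:

\[ r_i = p_i \sqrt{ \frac{(1/\tau^2 - 1) \sum_j c_j^2}{\sum_j p_j^2} } + \frac{c_i}{\tau} \tag{11} \]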
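The closed form of Equation 11, $r_i = p_i \sqrt{(1/\tau^2 - 1) \sum_j c_j^2 / \sum_j p_j^2} + c_i/\tau$, is straightforward to evaluate. The following sketch computes it for a small made-up instance and checks that the result sits on the surface of the Dice sphere. The function name `profit_adjusted` and all numeric values are hypothetical; note that the formula itself does not clip ratings to the maximum m, and neither does this sketch.

```python
import math

def dice(c, r):
    """Dice similarity 2 c.r / (|c|^2 + |r|^2)."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return 2.0 * dot / (sum(x * x for x in c) + sum(x * x for x in r))

def profit_adjusted(c, p, tau):
    """Move from c/tau in the direction of p until the sphere surface is reached."""
    scale = math.sqrt((1.0 / tau ** 2 - 1.0)
                      * sum(ci * ci for ci in c)
                      / sum(pi * pi for pi in p))
    return [pi * scale + ci / tau for ci, pi in zip(c, p)]

c = [4.0, 2.0, 3.0, 1.0]   # hypothetical predicted ratings
p = [1.0, 5.0, 2.0, 8.0]   # hypothetical per-item profits
tau = 0.9

r = profit_adjusted(c, p, tau)
linear_profit_c = sum(pi * ci for pi, ci in zip(p, c))  # m * E_p(c) under Equation 9
linear_profit_r = sum(pi * ri for pi, ri in zip(p, r))  # m * E_p(r) under Equation 9
print([round(x, 3) for x in r], round(dice(c, r), 6))
```

Items with larger profit receive a larger upward adjustment, and the similarity of the adjusted vector to c equals the threshold up to floating-point error.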



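Profit Gains. By presenting the customer with recommendation
r derived in Equation 11 the vendor earns expected profit

\[ E_p(r) = \frac{1}{m} \left( \sqrt{ \left( \frac{1}{\tau^2} - 1 \right) \sum_i c_i^2 \sum_i p_i^2 } + \frac{1}{\tau} \sum_i p_i c_i \right) \]

which is simplified via the Cauchy-Schwarz inequality^3 to

\[ E_p(r) \ge \frac{1}{m} \left( \sqrt{\frac{1}{\tau^2} - 1} + \frac{1}{\tau} \right) \sum_i p_i c_i \tag{12} \]

The expected profit from c is $E_p(c) = (\sum_i p_i c_i)/m$. The
vendor's profit gain from presenting r rather than c is

\[ \frac{E_p(r) - E_p(c)}{E_p(c)} \ge \sqrt{\frac{1}{\tau^2} - 1} + \frac{1}{\tau} - 1 \ge 2 \left( \frac{1}{\tau} - 1 \right) \]

For instance, if the vendor presents recommendation vectors
that are within similarity threshold $\tau = 0.9$ of c, then in
expectation he earns at least 2(10/9 - 1) > 22% more profit
presenting r rather than c.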
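The bound can be checked on a concrete instance. The sketch below computes the relative profit gain of the profit-adjusted vector over c under the linear purchase model and compares it with the two lower bounds $\sqrt{1/\tau^2 - 1} + 1/\tau - 1$ and $2(1/\tau - 1)$. All names and numbers are hypothetical.

```python
import math

def profit_adjusted(c, p, tau):
    """Closed-form maximizer on the Dice sphere for the linear purchase model."""
    scale = math.sqrt((1.0 / tau ** 2 - 1.0)
                      * sum(ci * ci for ci in c)
                      / sum(pi * pi for pi in p))
    return [pi * scale + ci / tau for ci, pi in zip(c, p)]

def expected_profit(p, r, m):
    """Linear model: item i is bought with probability r_i / m."""
    return sum(pi * ri for pi, ri in zip(p, r)) / m

c = [4.0, 2.0, 3.0, 1.0]   # hypothetical predicted ratings
p = [1.0, 5.0, 2.0, 8.0]   # hypothetical per-item profits
m, tau = 5.0, 0.9

r = profit_adjusted(c, p, tau)
gain = expected_profit(p, r, m) / expected_profit(p, c, m) - 1.0
tight = math.sqrt(1.0 / tau ** 2 - 1.0) + 1.0 / tau - 1.0  # after Cauchy-Schwarz
loose = 2.0 * (1.0 / tau - 1.0)                            # the 22% figure
print(gain, tight, loose)
```

The realized gain depends on how far p is from being parallel to c; the Cauchy-Schwarz step is tight only when the two are proportional, so instances like this one can exceed the stated bound by a wide margin.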

4.3 Simple distribution 

Now we consider the scenario where at each time step the
customer purchases only one item but chooses which item
to purchase based on how its rating compares to the ratings of
other items. A simple way to model this is to assume that
item i is purchased with probability $r_i / \sum_j r_j$. Thus the
customer first scales each rating by the sum of the vendor's
ratings, and then chooses among the offered items with
probabilities given by the scaled ratings. The expected profit will be

\[ E_p(r) = p \cdot \varphi(r) = \frac{\sum_i p_i r_i}{\sum_j r_j} \tag{13} \]

The local maxima inside the Dice sphere occur where the
gradient of the expected profit is zero. The ith component of
the gradient is

\[ (\nabla E_p)_i = \frac{p_i \sum_j r_j - \sum_j r_j p_j}{\left( \sum_j r_j \right)^2} \]

For $\nabla E_p = 0$, it must be the case that $p_i = \sum_j r_j p_j / \sum_j r_j$ for all
i. Thus if there are at least two items with different prices
then there are no local maxima inside the Dice sphere.

We move on to find local maxima on the surface of the
Dice sphere. If we try to use the method of Lagrange
multipliers as before, we end up having to solve the following
system of equations for variables $\lambda$ and $r_i$ for all i:

\[ \frac{p_i \sum_j r_j - \sum_j r_j p_j}{\left( \sum_j r_j \right)^2} = 2\lambda \left( r_i - \frac{c_i}{\tau} \right) \text{ for all } i \quad \text{and} \quad Dice(r) = \tau \tag{14} \]

Unfortunately we do not know how to find a solution for
this, as each equation in the first set involves $r_j$ for all j. Recall
that in Section 4.2 the corresponding equations for finding
the maxima on the Dice surface were each a function of
only one $r_i$. This led us to seek an alternative approach.
We will reduce solving the optimization problem under the
definition of $E_p$ given in Equation 13 to solving a series of
simpler optimization problems to which we can readily
apply the method of Lagrange multipliers. To do so, first
consider the decision version of our problem: does there
exist an r such that $E_p(r) \ge V$ and $Dice(r) \ge \tau$?

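Under Equation 13, having $E_p(r) \ge V$ is equivalent to
having $\sum_i (p_i - V) r_i \ge 0$. Thus the decision version of our
problem is equivalent to solving the following maximization
problem and checking that its optimal value is non-negative
and that its solution has $r_i \ge 0$ for all i,

\[ \max_{r} \; E_p(r) = \sum_i (p_i - V)\, r_i \quad \text{s.t.} \quad Dice(r) \ge \tau \tag{15} \]

^3 The Cauchy-Schwarz inequality is $\sum_i p_i^2 \sum_i c_i^2 \ge \left( \sum_i p_i c_i \right)^2$.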

Equation 15 can be solved using the general approach
outlined in Section 4.1. The gradient $\nabla E_p(r) = 0$ iff $p_i - V = 0$
for all i. Thus as long as some item is not priced V, there are
no local maxima for Equation 15 inside the Dice sphere.^4

To find the maxima on the surface of the sphere, we solve
the following system of equations,

\[ p_i - V = 2\lambda \left( r_i - \frac{c_i}{\tau} \right) \text{ for all } i \quad \text{and} \quad Dice(r) = \tau \tag{16} \]

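Note that Equation 16 differs from Equation 10 only by
constants, so the solution derived in Section 4.2, shifted by
constants, is a solution for Equation 16. We get that

\[ r_i = \frac{p_i - V}{2\lambda} + \frac{c_i}{\tau} \quad \text{where} \quad \lambda = \frac{1}{2} \sqrt{ \frac{\sum_j (p_j - V)^2}{(1/\tau^2 - 1) \sum_j c_j^2} } \]

If this solution attains a non-negative objective value and has
$r_i \ge 0$ for all i, we return a "yes" for the solution of the
decision problem and otherwise we return "no".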

To find a solution for the original optimization problem
which is arbitrarily close to optimal, we can do binary
search along a bounded interval of possible values of
$E_p$, checking the existence of solutions using the decision
version algorithm described above. Let $V_{max} = \max_i p_i$. The
initial binary search interval can be set to $[0, V_{max}]$ since
$E_p(r) \le V_{max}$, as the customer purchases one item per time
step. Each binary search step reduces the search interval
by half, so additional steps bring us closer
to the optimal solution for the optimization
problem. Let $\delta < 1$. A solution which is within
distance $V_{max}\delta$ of the optimal can be found by performing
$\log(1/\delta)$ binary search steps. For example, let $r^*$ denote
the optimal solution. With $\delta = 1/V_{max}$, we can obtain an
approximate solution $r_a$ such that $E_p(r_a) + 1 \ge E_p(r^*)$ in
$\log(1/\delta) = \log(V_{max}) = O(\log V_{max})$ binary search steps.
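The binary search over V can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: profits are assumed non-negative and c is assumed non-zero; the function names, the tolerance `delta` and the sample data are hypothetical. The decision step accepts a candidate only when $\sum_i (p_i - V) r_i \ge 0$, which is what $E_p(r) \ge V$ amounts to under Equation 13, and when every $r_i \ge 0$. A "no" from this step only means that the sphere-surface candidate failed these checks.

```python
import math

def dice(c, r):
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return 2.0 * dot / (sum(x * x for x in c) + sum(x * x for x in r))

def expected_profit(p, r):
    """Single-purchase model: item i is bought with probability r_i / sum_j r_j."""
    return sum(pi * ri for pi, ri in zip(p, r)) / sum(r)

def decide(c, p, tau, V):
    """Return a witness r with E_p(r) >= V and Dice(r) >= tau, or None."""
    q = [pi - V for pi in p]                      # shifted profits p_i - V
    qq = sum(qi * qi for qi in q)
    if qq == 0.0:
        return list(c)                            # every item earns V, so c works
    scale = math.sqrt((1.0 / tau ** 2 - 1.0) * sum(ci * ci for ci in c) / qq)
    r = [qi * scale + ci / tau for qi, ci in zip(q, c)]   # maximizer on the surface
    objective = sum(qi * ri for qi, ri in zip(q, r))
    if objective >= 0.0 and all(ri >= 0.0 for ri in r):
        return r
    return None

def approx_max_profit(c, p, tau, delta=1e-6):
    """Binary search on V over [0, max_i p_i]; returns (V, witness r)."""
    lo, hi = 0.0, max(p)
    best = list(c)                                # c is feasible and earns at least 0
    for _ in range(math.ceil(math.log2(1.0 / delta))):
        mid = (lo + hi) / 2.0
        r = decide(c, p, tau, mid)
        if r is None:
            hi = mid
        else:
            lo, best = mid, r
    return lo, best

c = [4.0, 2.0, 3.0, 1.0]   # hypothetical predicted ratings
p = [1.0, 5.0, 2.0, 8.0]   # hypothetical per-item profits
V, r = approx_max_profit(c, p, tau=0.9)
print(V, expected_profit(p, r), expected_profit(p, c))
```

Each iteration costs one closed-form evaluation, so the total work is linear in the number of items times the number of binary search steps.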
Profit Gains. Thus we are able to find a near optimal
solution to the optimization problem with $E_p(r)$ as defined in
Equation 13 by solving a series of optimization problems of
the kind solved in Section 4.2. The profit gains analysis from
Section 4.2 extends to the last "yes" solution obtained for a
decision problem. However, as this "yes" solution is near
optimal for the original optimization problem, the profit gains
will be close to those from Section 4.2.



5. CONCLUSION 

Traditional recommendation systems do not incorporate
the profitability of items into their recommendations. In this
work we propose one of the first methods that balances
maximizing the vendor's profits with providing accurate
recommendations. Our approach can supplement any traditional
recommendation system and allows the vendor to control how
much the profit-based recommendation should deviate from
the traditional recommendation. We study our approach
under two settings and show that the vendor can raise
approximately 22% more profit by allowing a 10% deviation.
Our work is a starting point and we hope it will stimulate
new research on incorporating profits into recommendation
systems.

^4 If all items are priced V, all r yield expected profit V and
we could pick any r which lies inside the Dice sphere.
Properties of the Dice coefficient

Proof. (Proof of Property 3.1)

$Dice(r) \ge 0$ because no item is rated less than zero, so
the numerator of Equation 2 is never negative. Now suppose
that $Dice(r) > 1$. By Equation 2 this implies that
$0 > \sum_i (c_i^2 - 2 r_i c_i + r_i^2) = \sum_i (c_i - r_i)^2$, which is a contradiction.

Thus $Dice(r) \le 1$.

Suppose that $r_i = c_i$ for every i. Then using Equation 2,
$Dice(r) = 1$. To prove the other direction suppose that
$Dice(r) = 1$. The formulation of the Dice coefficient given
in Equation 3 can be used to show that $r_i = c_i$ for all i.
Note that $0 \le \cos(\theta) \le 1$ as no item is rated less than
zero. The second term of Equation 3 is non-negative by
definition and the following proof by contradiction shows
it is at most 1: suppose $2\|c\|\|r\| / (\|c\|^2 + \|r\|^2) > 1$; then
$2\|c\|\|r\| > \|c\|^2 + \|r\|^2$, implying that $0 > (\|c\| - \|r\|)^2$,
which is false. Thus if $Dice(r) = 1$ both terms of Equation
3 must be 1. As $\cos(\theta) = 1$ the angle between c and r is zero,
and the second term being 1 implies that $(\|c\| - \|r\|)^2 = 0$, i.e.,
that r and c have equal length. Together this implies that
$r_i = c_i$ for all i.

Finally note that when $Dice(r) = 0$, the numerator of
Equation 2 is zero, which implies that $r_i$ must be zero for each i
such that $c_i > 0$. □



